-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[LifetimeSafety] Introduce intra-procedural analysis in Clang #142313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c8f6277
to
3624359
Compare
✅ With the latest revision this PR passed the Python code formatter. |
668e329
to
7b929ee
Compare
136238f
to
942648e
Compare
942648e
to
cb7bd7f
Compare
cb7bd7f
to
0cd187b
Compare
@llvm/pr-subscribers-clang-analysis Author: Utkarsh Saxena (usx95) ChangesThis patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291 The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches. Key Components
Next Steps(Not covered in this PR but planned for subsequent patches) The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:
<details> Performance on pathological test cases: </summary> The pathological case arise when we have struct MyObj {
int id;
~MyObj() {}
};
void long_cycle(bool condition) {
MyObj v1{1};
MyObj v2{1};
MyObj v3{1};
MyObj v4{1};
MyObj* p1 = &v1;
MyObj* p2 = &v2;
MyObj* p3 = &v3;
MyObj* p4 = &v4;
while (condition) {
MyObj* temp = p1;
p1 = p2;
p2 = p3;
p3 = p4;
p4 = temp;
}
} </details> Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff 5 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+ AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
FixitUtil.cpp
IntervalPartition.cpp
IssueHash.cpp
+ LifetimeSafety.cpp
LiveVariables.cpp
MacroExpansionContext.cpp
ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+ const clang::CFGBlock *Block;
+ /// Index into Block->Elements().
+ unsigned ElementIndex;
+
+ Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+ : Block(B), ElementIndex(Idx) {}
+
+ bool operator==(const Point &Other) const {
+ return Block == Other.Block && ElementIndex == Other.ElementIndex;
+ }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+ const clang::ValueDecl *D;
+
+ enum class Kind : uint8_t {
+ StackVariable,
+ Heap, // TODO: Handle.
+ Field, // TODO: Handle.
+ ArrayElement, // TODO: Handle.
+ TemporaryObject, // TODO: Handle.
+ StaticOrGlobal, // TODO: Handle.
+ };
+
+ Kind PathKind;
+
+ Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+ /// TODO: Represent opaque loans.
+ /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+ /// is represented as empty LoanSet
+ LoanID ID;
+ Path SourcePath;
+ SourceLocation IssueLoc;
+
+ LoanInfo(LoanID id, Path path, SourceLocation loc)
+ : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+ OriginID ID;
+ OriginKind Kind;
+ union {
+ const clang::ValueDecl *Decl;
+ const clang::Expr *Expression;
+ };
+ OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+ : ID(id), Kind(kind), Decl(D) {}
+ OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+ : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+ LoanManager() = default;
+
+ LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+ NextLoanIDVal++;
+ AllLoans.emplace_back(NextLoanIDVal, path, loc);
+ return AllLoans.back();
+ }
+
+ const LoanInfo *getLoanInfo(LoanID id) const {
+ if (id < AllLoans.size())
+ return &AllLoans[id];
+ return nullptr;
+ }
+ llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+ LoanID NextLoanIDVal = 0;
+ llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+ OriginManager() = default;
+
+ OriginID getNextOriginID() { return NextOriginIDVal++; }
+ OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+ assert(D != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::Variable, D);
+ return AllOrigins.back();
+ }
+ OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+ assert(E != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+ return AllOrigins.back();
+ }
+
+ OriginID getOrCreate(const Expr *E) {
+ auto It = ExprToOriginID.find(E);
+ if (It != ExprToOriginID.end())
+ return It->second;
+
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+ // Origin of DeclRefExpr is that of the declaration it refers to.
+ return getOrCreate(DRE->getDecl());
+ }
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, E);
+ ExprToOriginID[E] = NewID;
+ return NewID;
+ }
+
+ const OriginInfo *getOriginInfo(OriginID id) const {
+ if (id < AllOrigins.size())
+ return &AllOrigins[id];
+ return nullptr;
+ }
+
+ llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+ OriginID getOrCreate(const ValueDecl *D) {
+ auto It = DeclToOriginID.find(D);
+ if (It != DeclToOriginID.end())
+ return It->second;
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, D);
+ DeclToOriginID[D] = NewID;
+ return NewID;
+ }
+
+private:
+ OriginID NextOriginIDVal = 0;
+ llvm::SmallVector<OriginInfo> AllOrigins;
+ llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+ llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+ enum class Kind : uint8_t {
+ /// A new loan is issued from a borrow expression (e.g., &x).
+ Issue,
+ /// A loan expires as its underlying storage is freed (e.g., variable goes
+ /// out of scope).
+ Expire,
+ /// An origin is propagated from a source to a destination (e.g., p = q).
+ AssignOrigin,
+ /// An origin is part of a function's return value.
+ ReturnOfOrigin
+ };
+
+private:
+ Kind K;
+
+protected:
+ Point P;
+ Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+ virtual ~Fact() = default;
+ Kind getKind() const { return K; }
+ Point getPoint() const { return P; }
+
+ template <typename T> const T *getAs() const {
+ if (T::classof(this))
+ return static_cast<const T *>(this);
+ return nullptr;
+ }
+
+ virtual void dump(llvm::raw_ostream &OS) const {
+ OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+ << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+ }
+};
+
+class IssueFact : public Fact {
+ LoanID LID;
+ OriginID OID;
+
+public:
+ IssueFact(LoanID LID, OriginID OID, Point Pt)
+ : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+ LoanID getLoanID() const { return LID; }
+ OriginID getOriginID() const { return OID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+ << ")\n";
+ }
+};
+
+class ExpireFact : public Fact {
+ LoanID LID;
+
+public:
+ ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+ LoanID getLoanID() const { return LID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Expire (LoanID: " << getLoanID() << ")\n";
+ }
+};
+
+class AssignOriginFact : public Fact {
+ OriginID OIDDest;
+ OriginID OIDSrc;
+
+public:
+ AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+ : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+ OriginID getDestOriginID() const { return OIDDest; }
+ OriginID getSrcOriginID() const { return OIDSrc; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::AssignOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "AssignOrigin (DestID: " << getDestOriginID()
+ << ", SrcID: " << getSrcOriginID() << ")\n";
+ }
+};
+
+class ReturnOfOriginFact : public Fact {
+ OriginID OID;
+
+public:
+ ReturnOfOriginFact(OriginID OID, Point Pt)
+ : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+ OriginID getReturnedOriginID() const { return OID; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::ReturnOfOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+ }
+};
+
+class FactManager {
+public:
+ llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end())
+ return It->second;
+ return {};
+ }
+
+ void addFact(const CFGBlock *B, Fact *NewFact) {
+ BlockToFactsMap[B].push_back(NewFact);
+ }
+
+ template <typename FactType, typename... Args>
+ FactType *createFact(Args &&...args) {
+ void *Mem = FactAllocator.Allocate<FactType>();
+ return new (Mem) FactType(std::forward<Args>(args)...);
+ }
+
+ void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+ llvm::dbgs() << "==========================================\n";
+ llvm::dbgs() << " Lifetime Analysis Facts:\n";
+ llvm::dbgs() << "==========================================\n";
+ if (const Decl *D = AC.getDecl()) {
+ if (const auto *ND = dyn_cast<NamedDecl>(D))
+ llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+ }
+ // Print blocks in the order as they appear in code for a stable ordering.
+ ForwardDataflowWorklist worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ worklist.enqueueBlock(B);
+ while (const CFGBlock *B = worklist.dequeue()) {
+ llvm::dbgs() << " Block B" << B->getBlockID() << ":\n";
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end()) {
+ for (const Fact *F : It->second) {
+ llvm::dbgs() << " ";
+ F->dump(llvm::dbgs());
+ }
+ }
+ llvm::dbgs() << " End of Block\n";
+ }
+ }
+
+private:
+ llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+ BlockToFactsMap;
+ llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+ FactManager Facts;
+ LoanManager Loans;
+ OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+ FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+ : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+ void run() {
+ llvm::TimeTraceScope TimeProfile("Fact Generation");
+ // Iterate through the CFG blocks in pre-order to ensure that
+ // initializations and destructions are processed in the correct sequence.
+ // TODO: A pre-order traversal utility should be provided by Dataflow
+ // framework.
+ ForwardDataflowWorklist Worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ Worklist.enqueueBlock(B);
+ while (const CFGBlock *Block = Worklist.dequeue()) {
+ CurrentBlock = Block;
+ for (unsigned I = 0; I < Block->size(); ++I) {
+ const CFGElement &Element = Block->Elements[I];
+ CurrentPoint = Point(Block, I);
+ if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+ Visit(CS->getStmt());
+ else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+ Element.getAs<CFGAutomaticObjDtor>())
+ handleDestructor(*DtorOpt);
+ }
+ }
+ }
+
+ void VisitDeclStmt(const DeclStmt *DS) {
+ for (const Decl *D : DS->decls()) {
+ if (const auto *VD = dyn_cast<VarDecl>(D)) {
+ if (hasOrigin(VD->getType())) {
+ if (const Expr *InitExpr = VD->getInit()) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+ if (!hasOrigin(ICE->getType()))
+ return;
+ // An ImplicitCastExpr node itself gets an origin, which flows from the
+ // origin of its sub-expression (after stripping its own parens/casts).
+ if (ICE->getCastKind() == CK_LValueToRValue) {
+ OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+ OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ IceOID, SubExprOID, CurrentPoint));
+ }
+ }
+
+ void VisitUnaryOperator(const UnaryOperator *UO) {
+ if (UO->getOpcode() == UO_AddrOf) {
+ const Expr *SubExpr = UO->getSubExpr();
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+ if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+ // Check if it's a local variable.
+ if (VD->hasLocalStorage()) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+ Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+ LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+ UO->getOperatorLoc());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<IssueFact>(
+ Loan.ID, OID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitReturnStmt(const ReturnStmt *RS) {
+ if (const Expr *RetExpr = RS->getRetValue()) {
+ if (hasOrigin(RetExpr->getType())) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+ }
+ }
+ }
+
+ void VisitBinaryOperator(const BinaryOperator *BO) {
+ if (BO->isAssignmentOp()) {
+ const Expr *LHSExpr = BO->getLHS();
+ const Expr *RHSExpr = BO->getRHS();
+
+ // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+ // LHS must be a pointer/reference type that can be an origin.
+ // RHS must also represent an origin (either another pointer/ref or an
+ // address-of).
+ if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+ if (const auto *VD_LHS =
+ dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+ VD_LHS && hasOrigin(VD_LHS->getType())) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+private:
+ // Check if a type have an origin.
+ bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+ void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+ /// TODO: Also handle trivial destructors (e.g., for `int`
+ // variables) which will never have a CFGAutomaticObjDtor node.
+ /// TODO: Handle loans to temporaries.
+ const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+ if (!DestructedVD)
+ return;
+ // Iterate through all loans to see if any expire.
+ for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+ const Path &LoanPath = Loan.SourcePath;
+ // Check if the loan is for a stack variable and if that variable
+ // is the one being destructed.
+ if (LoanPath.PathKind == Path::Kind::StackVariable) {
+ if (LoanPath.D == DestructedVD) {
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+ FactsContext &FactsCtx;
+ const CFG &Cfg;
+ AnalysisDeclContext &AC;
+ const CFGBlock *CurrentBlock;
+ Point CurrentPoint;
+};
+
+// ========================================================================= //
+// The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+ OriginLoanMap::Factory OriginMapFact;
+ LoanSet::Factory LoanSetFact;
+
+ LoanSet createLoanSet(LoanID LID) {
+ return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+ }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+ /// The map from an origin to the set of loans it contains.
+ /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+ /// not expressions, because expressions are not visible across blocks.
+ OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+ explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+ LifetimeLattice() = default;
+
+ bool operator==(const LifetimeLattice &Other) const {
+ return Origins == Other.Origins;
+ }
+ bool operator!=(const LifetimeLattice &Other) const {
+ return !(*this == Other);
+ }
+
+ LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+ if (auto *Loans = Origins.lookup(OID))
+ return *Loans;
+ return Factory.LoanSetFact.getEmptySet();
+ }
+
+ /// Computes the union of two lattices by performing a key-wise merge of
+ /// their OriginLoanMaps.
+ LifetimeLattice merge(const LifetimeLattice &Other,
+ LifetimeFactory &Factory) const {
+ /// Merge the smaller map into the larger one ensuring we iterate over the
+ /// smaller map.
+ if (Origins.getHeight() < Other.Origins.getHeight())
+ return Other.merge(*this, Factory);
+
+ OriginLoanMap MergedState = Origins;
+ // For each origin in the other map, union its loan set with ours.
+ for (const auto &Entry : Other.Origins) {
+ OriginID OID = Entry.first;
+ LoanSet OtherLoanSet = Entry.second;
+ MergedState = Factory.OriginMapFact.add(
+ MergedState, OID,
+ merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+ }
+ return LifetimeLattice(MergedState);
+ }
+
+ LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+ /// Merge the smaller set into the larger one ensuring we iterate over the
+ /// smaller set.
+ if (a.getHeight() < b.getHeight())
+ std::swap(a, b);
+ LoanSet Result = a;
+ for (LoanID LID : b) {
+ /// TODO(opt): Profiling shows that this loop is a major performance
+ /// bottleneck. Investigate using a BitVector to represent the set of
+ /// loans for improved merge performance.
+ Result = Factory.LoanSetFact.add(Result, LID);
+ }
+ return Result;
+ }
+
+ void dump(llvm::raw_ostream &OS) const {
+ OS << "LifetimeLattice State:\n";
+ if (Origins.isEmpty())
+ OS << " <empty>\n";
+ for (const auto &Entry : Origi...
[truncated]
|
@llvm/pr-subscribers-clang Author: Utkarsh Saxena (usx95) ChangesThis patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291 The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches. Key Components
Next Steps(Not covered in this PR but planned for subsequent patches) The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:
<details> Performance on pathological test cases: </summary> The pathological case arise when we have struct MyObj {
int id;
~MyObj() {}
};
void long_cycle(bool condition) {
MyObj v1{1};
MyObj v2{1};
MyObj v3{1};
MyObj v4{1};
MyObj* p1 = &v1;
MyObj* p2 = &v2;
MyObj* p3 = &v3;
MyObj* p4 = &v4;
while (condition) {
MyObj* temp = p1;
p1 = p2;
p2 = p3;
p3 = p4;
p4 = temp;
}
} </details> Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff 5 Files Affected:
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+ AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
FixitUtil.cpp
IntervalPartition.cpp
IssueHash.cpp
+ LifetimeSafety.cpp
LiveVariables.cpp
MacroExpansionContext.cpp
ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+ const clang::CFGBlock *Block;
+ /// Index into Block->Elements().
+ unsigned ElementIndex;
+
+ Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+ : Block(B), ElementIndex(Idx) {}
+
+ bool operator==(const Point &Other) const {
+ return Block == Other.Block && ElementIndex == Other.ElementIndex;
+ }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+ const clang::ValueDecl *D;
+
+ enum class Kind : uint8_t {
+ StackVariable,
+ Heap, // TODO: Handle.
+ Field, // TODO: Handle.
+ ArrayElement, // TODO: Handle.
+ TemporaryObject, // TODO: Handle.
+ StaticOrGlobal, // TODO: Handle.
+ };
+
+ Kind PathKind;
+
+ Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+ /// TODO: Represent opaque loans.
+ /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+ /// is represented as empty LoanSet
+ LoanID ID;
+ Path SourcePath;
+ SourceLocation IssueLoc;
+
+ LoanInfo(LoanID id, Path path, SourceLocation loc)
+ : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+ OriginID ID;
+ OriginKind Kind;
+ union {
+ const clang::ValueDecl *Decl;
+ const clang::Expr *Expression;
+ };
+ OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+ : ID(id), Kind(kind), Decl(D) {}
+ OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+ : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+ LoanManager() = default;
+
+ LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+ NextLoanIDVal++;
+ AllLoans.emplace_back(NextLoanIDVal, path, loc);
+ return AllLoans.back();
+ }
+
+ const LoanInfo *getLoanInfo(LoanID id) const {
+ if (id < AllLoans.size())
+ return &AllLoans[id];
+ return nullptr;
+ }
+ llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+ LoanID NextLoanIDVal = 0;
+ llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+ OriginManager() = default;
+
+ OriginID getNextOriginID() { return NextOriginIDVal++; }
+ OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+ assert(D != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::Variable, D);
+ return AllOrigins.back();
+ }
+ OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+ assert(E != nullptr);
+ AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+ return AllOrigins.back();
+ }
+
+ OriginID getOrCreate(const Expr *E) {
+ auto It = ExprToOriginID.find(E);
+ if (It != ExprToOriginID.end())
+ return It->second;
+
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+ // Origin of DeclRefExpr is that of the declaration it refers to.
+ return getOrCreate(DRE->getDecl());
+ }
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, E);
+ ExprToOriginID[E] = NewID;
+ return NewID;
+ }
+
+ const OriginInfo *getOriginInfo(OriginID id) const {
+ if (id < AllOrigins.size())
+ return &AllOrigins[id];
+ return nullptr;
+ }
+
+ llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+ OriginID getOrCreate(const ValueDecl *D) {
+ auto It = DeclToOriginID.find(D);
+ if (It != DeclToOriginID.end())
+ return It->second;
+ OriginID NewID = getNextOriginID();
+ addOriginInfo(NewID, D);
+ DeclToOriginID[D] = NewID;
+ return NewID;
+ }
+
+private:
+ OriginID NextOriginIDVal = 0;
+ llvm::SmallVector<OriginInfo> AllOrigins;
+ llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+ llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+ enum class Kind : uint8_t {
+ /// A new loan is issued from a borrow expression (e.g., &x).
+ Issue,
+ /// A loan expires as its underlying storage is freed (e.g., variable goes
+ /// out of scope).
+ Expire,
+ /// An origin is propagated from a source to a destination (e.g., p = q).
+ AssignOrigin,
+ /// An origin is part of a function's return value.
+ ReturnOfOrigin
+ };
+
+private:
+ Kind K;
+
+protected:
+ Point P;
+ Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+ virtual ~Fact() = default;
+ Kind getKind() const { return K; }
+ Point getPoint() const { return P; }
+
+ template <typename T> const T *getAs() const {
+ if (T::classof(this))
+ return static_cast<const T *>(this);
+ return nullptr;
+ }
+
+ virtual void dump(llvm::raw_ostream &OS) const {
+ OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+ << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+ }
+};
+
+class IssueFact : public Fact {
+ LoanID LID;
+ OriginID OID;
+
+public:
+ IssueFact(LoanID LID, OriginID OID, Point Pt)
+ : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+ LoanID getLoanID() const { return LID; }
+ OriginID getOriginID() const { return OID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+ << ")\n";
+ }
+};
+
+class ExpireFact : public Fact {
+ LoanID LID;
+
+public:
+ ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+ LoanID getLoanID() const { return LID; }
+ static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "Expire (LoanID: " << getLoanID() << ")\n";
+ }
+};
+
+class AssignOriginFact : public Fact {
+ OriginID OIDDest;
+ OriginID OIDSrc;
+
+public:
+ AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+ : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+ OriginID getDestOriginID() const { return OIDDest; }
+ OriginID getSrcOriginID() const { return OIDSrc; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::AssignOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "AssignOrigin (DestID: " << getDestOriginID()
+ << ", SrcID: " << getSrcOriginID() << ")\n";
+ }
+};
+
+class ReturnOfOriginFact : public Fact {
+ OriginID OID;
+
+public:
+ ReturnOfOriginFact(OriginID OID, Point Pt)
+ : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+ OriginID getReturnedOriginID() const { return OID; }
+ static bool classof(const Fact *F) {
+ return F->getKind() == Kind::ReturnOfOrigin;
+ }
+ void dump(llvm::raw_ostream &OS) const override {
+ OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+ }
+};
+
+class FactManager {
+public:
+ llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end())
+ return It->second;
+ return {};
+ }
+
+ void addFact(const CFGBlock *B, Fact *NewFact) {
+ BlockToFactsMap[B].push_back(NewFact);
+ }
+
+ template <typename FactType, typename... Args>
+ FactType *createFact(Args &&...args) {
+ void *Mem = FactAllocator.Allocate<FactType>();
+ return new (Mem) FactType(std::forward<Args>(args)...);
+ }
+
+ void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+ llvm::dbgs() << "==========================================\n";
+ llvm::dbgs() << " Lifetime Analysis Facts:\n";
+ llvm::dbgs() << "==========================================\n";
+ if (const Decl *D = AC.getDecl()) {
+ if (const auto *ND = dyn_cast<NamedDecl>(D))
+ llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+ }
+ // Print blocks in the order as they appear in code for a stable ordering.
+ ForwardDataflowWorklist worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ worklist.enqueueBlock(B);
+ while (const CFGBlock *B = worklist.dequeue()) {
+ llvm::dbgs() << " Block B" << B->getBlockID() << ":\n";
+ auto It = BlockToFactsMap.find(B);
+ if (It != BlockToFactsMap.end()) {
+ for (const Fact *F : It->second) {
+ llvm::dbgs() << " ";
+ F->dump(llvm::dbgs());
+ }
+ }
+ llvm::dbgs() << " End of Block\n";
+ }
+ }
+
+private:
+ llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+ BlockToFactsMap;
+ llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+ FactManager Facts;
+ LoanManager Loans;
+ OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+ FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+ : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+ void run() {
+ llvm::TimeTraceScope TimeProfile("Fact Generation");
+ // Iterate through the CFG blocks in pre-order to ensure that
+ // initializations and destructions are processed in the correct sequence.
+ // TODO: A pre-order traversal utility should be provided by Dataflow
+ // framework.
+ ForwardDataflowWorklist Worklist(Cfg, AC);
+ for (const CFGBlock *B : Cfg.const_nodes())
+ Worklist.enqueueBlock(B);
+ while (const CFGBlock *Block = Worklist.dequeue()) {
+ CurrentBlock = Block;
+ for (unsigned I = 0; I < Block->size(); ++I) {
+ const CFGElement &Element = Block->Elements[I];
+ CurrentPoint = Point(Block, I);
+ if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+ Visit(CS->getStmt());
+ else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+ Element.getAs<CFGAutomaticObjDtor>())
+ handleDestructor(*DtorOpt);
+ }
+ }
+ }
+
+ void VisitDeclStmt(const DeclStmt *DS) {
+ for (const Decl *D : DS->decls()) {
+ if (const auto *VD = dyn_cast<VarDecl>(D)) {
+ if (hasOrigin(VD->getType())) {
+ if (const Expr *InitExpr = VD->getInit()) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+ if (!hasOrigin(ICE->getType()))
+ return;
+ // An ImplicitCastExpr node itself gets an origin, which flows from the
+ // origin of its sub-expression (after stripping its own parens/casts).
+ if (ICE->getCastKind() == CK_LValueToRValue) {
+ OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+ OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ IceOID, SubExprOID, CurrentPoint));
+ }
+ }
+
+ void VisitUnaryOperator(const UnaryOperator *UO) {
+ if (UO->getOpcode() == UO_AddrOf) {
+ const Expr *SubExpr = UO->getSubExpr();
+ if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+ if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+ // Check if it's a local variable.
+ if (VD->hasLocalStorage()) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+ Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+ LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+ UO->getOperatorLoc());
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<IssueFact>(
+ Loan.ID, OID, CurrentPoint));
+ }
+ }
+ }
+ }
+ }
+
+ void VisitReturnStmt(const ReturnStmt *RS) {
+ if (const Expr *RetExpr = RS->getRetValue()) {
+ if (hasOrigin(RetExpr->getType())) {
+ OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+ }
+ }
+ }
+
+ void VisitBinaryOperator(const BinaryOperator *BO) {
+ if (BO->isAssignmentOp()) {
+ const Expr *LHSExpr = BO->getLHS();
+ const Expr *RHSExpr = BO->getRHS();
+
+ // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+ // LHS must be a pointer/reference type that can be an origin.
+ // RHS must also represent an origin (either another pointer/ref or an
+ // address-of).
+ if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+ if (const auto *VD_LHS =
+ dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+ VD_LHS && hasOrigin(VD_LHS->getType())) {
+ OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+ OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+ FactsCtx.Facts.addFact(CurrentBlock,
+ FactsCtx.Facts.createFact<AssignOriginFact>(
+ DestOID, SrcOID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+private:
+ // Check if a type have an origin.
+ bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+ void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+ /// TODO: Also handle trivial destructors (e.g., for `int`
+ // variables) which will never have a CFGAutomaticObjDtor node.
+ /// TODO: Handle loans to temporaries.
+ const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+ if (!DestructedVD)
+ return;
+ // Iterate through all loans to see if any expire.
+ for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+ const Path &LoanPath = Loan.SourcePath;
+ // Check if the loan is for a stack variable and if that variable
+ // is the one being destructed.
+ if (LoanPath.PathKind == Path::Kind::StackVariable) {
+ if (LoanPath.D == DestructedVD) {
+ FactsCtx.Facts.addFact(
+ CurrentBlock,
+ FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+ }
+ }
+ }
+ }
+
+ FactsContext &FactsCtx;
+ const CFG &Cfg;
+ AnalysisDeclContext &AC;
+ const CFGBlock *CurrentBlock;
+ Point CurrentPoint;
+};
+
+// ========================================================================= //
+// The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+ OriginLoanMap::Factory OriginMapFact;
+ LoanSet::Factory LoanSetFact;
+
+ LoanSet createLoanSet(LoanID LID) {
+ return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+ }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+ /// The map from an origin to the set of loans it contains.
+ /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+ /// not expressions, because expressions are not visible across blocks.
+ OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+ explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+ LifetimeLattice() = default;
+
+ bool operator==(const LifetimeLattice &Other) const {
+ return Origins == Other.Origins;
+ }
+ bool operator!=(const LifetimeLattice &Other) const {
+ return !(*this == Other);
+ }
+
+ LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+ if (auto *Loans = Origins.lookup(OID))
+ return *Loans;
+ return Factory.LoanSetFact.getEmptySet();
+ }
+
+ /// Computes the union of two lattices by performing a key-wise merge of
+ /// their OriginLoanMaps.
+ LifetimeLattice merge(const LifetimeLattice &Other,
+ LifetimeFactory &Factory) const {
+ /// Merge the smaller map into the larger one ensuring we iterate over the
+ /// smaller map.
+ if (Origins.getHeight() < Other.Origins.getHeight())
+ return Other.merge(*this, Factory);
+
+ OriginLoanMap MergedState = Origins;
+ // For each origin in the other map, union its loan set with ours.
+ for (const auto &Entry : Other.Origins) {
+ OriginID OID = Entry.first;
+ LoanSet OtherLoanSet = Entry.second;
+ MergedState = Factory.OriginMapFact.add(
+ MergedState, OID,
+ merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+ }
+ return LifetimeLattice(MergedState);
+ }
+
+ LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+ /// Merge the smaller set into the larger one ensuring we iterate over the
+ /// smaller set.
+ if (a.getHeight() < b.getHeight())
+ std::swap(a, b);
+ LoanSet Result = a;
+ for (LoanID LID : b) {
+ /// TODO(opt): Profiling shows that this loop is a major performance
+ /// bottleneck. Investigate using a BitVector to represent the set of
+ /// loans for improved merge performance.
+ Result = Factory.LoanSetFact.add(Result, LID);
+ }
+ return Result;
+ }
+
+ void dump(llvm::raw_ostream &OS) const {
+ OS << "LifetimeLattice State:\n";
+ if (Origins.isEmpty())
+ OS << " <empty>\n";
+ for (const auto &Entry : Origi...
[truncated]
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a very rudimentary first pass, will take a second look a bit later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, I don't have any major comments on the design, I have mostly some nits inline.
return; | ||
// An ImplicitCastExpr node itself gets an origin, which flows from the | ||
// origin of its sub-expression (after stripping its own parens/casts). | ||
if (ICE->getCastKind() == CK_LValueToRValue) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test so I can see how this works?
int a;
int *p = &a;
int **pp = &p;
int *q = *pp;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. I missed to mention the plans to handle this.
It doesn't work right now because the analysis gets tripped up by both address-of &p
and the reborrowing *pp
.
My plan is to fix this by moving from a single origin per pointer type to a list of origins.
The idea is to build up the origin list based on the type:
&p
We start with p
(type int*
), which has its own origin list, [O_p]
. Taking the address &
then prepends a new origin that holds a loan on the variable p
itself. The resulting origin list for the expression &p
(type int**
) is therefore [O_&p, O_p]
.
*pp
The origin list for the dereference *pp
is simply the tail of pp
's list, which is [O_p]
.
subtyping : e.g. assignment
int **pp = &p;
This is checked using a slightly more complex subtyping rule for the origin lists. For an assignment where the LHS
has origin list [O1, O2]
and the RHS
has [O3, O4]
, the subtyping enforces:
O3
is a subset ofO1
. (Standard subtyping for the outer pointer).O2
is an alias ofO4
. A stricter, invariant relationship for the inner pointers. Polonius also handles these by creating two subset constraints:O2 ⊆ O4
andO4 ⊆ O2
essentially making them equivalent.
struct LifetimeLattice { | ||
/// The map from an origin to the set of loans it contains. | ||
/// TODO(opt): To reduce the lattice size, propagate origins of declarations, | ||
/// not expressions, because expressions are not visible across blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because expressions are not visible across blocks
This is not entirely true, e.g., f(cond ? *p : *q)
; Here, *p
and *q
are not declarations but they are visible across blocks. That being said, there are a limited number of ways this can happen, so we still can make the optimization suggested here. It is just a bit more elaborate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question though. I sort of expected this to require a liveness analysis. Is this something we will need down the line? Or is that not required for polonius?
Good point. This is actually going to be a part of the error reporting. And yes, polonius does have a notion of live origins/loans.
I prefer approach 1 as this more directly works on the invalid memory as compared to origins carrying invalid loans to a potential use. |
This patch is already big enough. But I think a small comment of how reporting will work down the line might be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, looks very good. I would recommend you consider splitting this into multiple smaller PRs, so reviews can focus on a specific aspect. E.g. the FactGenerator and the LifetimeDataflow seems to deserve focused reviews.
struct LifetimeLattice { | ||
/// The map from an origin to the set of loans it contains. | ||
/// TODO(opt): To reduce the lattice size, propagate origins of declarations, | ||
/// not expressions, because expressions are not visible across blocks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.
/// Orchestrates the analysis by iterating over the CFG using a worklist | ||
/// algorithm. It computes a fixed point by propagating the LifetimeLattice | ||
/// state through each block until the state no longer changes. | ||
/// TODO: Maybe use the dataflow framework! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in the current form, where you use CFGBlocks to drive the computation, using the framework would make sense. But, if you're attached to the current algorithm (comparing at the before state rather than the after state), you'll need to implement it as an alternative (or modify the current one), as we discussed in our last meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed few comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work! Interested to see this feature in clang.
Left a couple of nit comments on code style.
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
7c6ca5d
to
634ba4b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some discussions open on this PR e.g., whether access paths need kinds. But I don't think those are blockers, this is exploratory work and it is expected that some design decisions might be revisited along the way when we learn something new. I am OK with the current status of this PR.
Something for future PRs: I think it would be nice to have some sort of coverage statistics. While initially you do not aim to cover all of C++, it would be good to know what portion of nodes are missing from the fact generation when we run this on some real world code. That can give us some progress indicators.
✅ With the latest revision this PR passed the C/C++ code formatter. |
bdec502
to
5f672e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something for future PRs: I think it would be nice to have some sort of coverage statistics. While initially you do not aim to cover all of C++, it would be good to know what portion of nodes are missing from the fact generation when we run this on some real world code. That can give us some progress indicators.
Good idea. I was thinking along similar lines and have some ideas for initial metrics:
- Use an RAV to visit all expressions in a function and count number of expressions which should have an origin (i.e., it is of pointer type) but does not have an origin associated with the expression.
- Count origins that are created but are never associated with a Loan; this would be a signal that we don't understand the pointer's source.
I will add these in a follow-up patch.
The assertion will be enabled once we have enough C++ coverage
In 1-2 days, I will go ahead and merge this if there are no more comments from other reviewers. cc: @ymand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some nits, but LG!
This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291
The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.
Key Components
Loan
,Origin
, andPath
to model memory borrows and the lifetime of pointers.llvm-lit
tests validate the analysis by checking the generated facts.Next Steps
(Not covered in this PR but planned for subsequent patches)
The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:
Origin
to the set ofLoans
it may contain at any given program point.[[clang::lifetimebound]]
annotations and by conservatively handling opaque/un-annotated functions.Performance on pathological test cases:
The pathological case arise when we have
N
origins initially holdingN
loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches afterN
iterations when all the origins contain all the loans.For
N
= 4, the test case would like like: